[UR][CUDA] Fix race condition in CUDA stream creation#15100
Merged
steffenlarsen merged 1 commit intointel:syclfrom Aug 20, 2024
Merged
[UR][CUDA] Fix race condition in CUDA stream creation#15100steffenlarsen merged 1 commit intointel:syclfrom
steffenlarsen merged 1 commit intointel:syclfrom
Conversation
rodburns
reviewed
Aug 16, 2024
af58982 to
4d07d50
Compare
4d07d50 to
374453c
Compare
374453c to
a399862
Compare
Contributor
This was referenced Aug 19, 2024
omarahmed1111
approved these changes
Aug 19, 2024
Contributor
|
@intel/llvm-gatekeepers Please merge, Thanks! |
npmiller
added a commit
to npmiller/llvm
that referenced
this pull request
Mar 26, 2025
The CUDA and HIP adapters are both using a nearly identical complicated queue that handles creating an out-of-order UR queue from in-order CUDA/HIP streams. This patch extracts all of the queue logic into a separate templated class that can be used by both adapters. Beyond removing a lot of duplicated code, it also makes it a lot easier to maintain. There was a few functional differences between the queues in both adapters, but mostly due to fixes done in the CUDA adapter that were not ported to the HIP adapter. There might be more but I found at least one race condition (intel#15100) and one performance issue (intel#6333) that weren't fixed in the HIP adapter. This patch uses the CUDA version of the queue as a base for the generic queue, and will thus fix for HIP the race condition and performance issue mentioned above. This code is quite complex, so this patch also aimed to minimize any other changes beyond the structural changes needed to share the code. However it did do the following changes in the two adapters: CUDA: * Rename `ur_stream_guard_` to `ur_stream_guard` * Rename `getNextEventID` to `getNextEventId` * Remove duplicate `get_device` getter, use `getDevice` instead HIP: * Fix queue finish so it doesn't fail when no streams need to be synchronized
sarnex
pushed a commit
that referenced
this pull request
Apr 2, 2025
The CUDA and HIP adapters are both using a nearly identical complicated queue that handles creating an out-of-order UR queue from in-order CUDA/HIP streams. This patch extracts all of the queue logic into a separate templated class that can be used by both adapters. Beyond removing a lot of duplicated code, it also makes it a lot easier to maintain. There was a few functional differences between the queues in both adapters, but mostly due to fixes done in the CUDA adapter that were not ported to the HIP adapter. There might be more but I found at least one race condition (#15100) and one performance issue (#6333) that weren't fixed in the HIP adapter. This patch uses the CUDA version of the queue as a base for the generic queue, and will thus fix for HIP the race condition and performance issue mentioned above. This code is quite complex, so this patch also aimed to minimize any other changes beyond the structural changes needed to share the code. However it did do the following changes in the two adapters: `stream_queue.hpp`: * Remove `urDeviceRetain/Release`: essentially a no-op CUDA: * Rename `ur_stream_guard_` to `ur_stream_guard` * Rename `getNextEventID` to `getNextEventId` * Remove duplicate `get_device` getter, use `getDevice` instead HIP: * Fix queue finish so it doesn't fail when no streams need to be synchronized
kbenzie
pushed a commit
to oneapi-src/unified-runtime
that referenced
this pull request
Apr 3, 2025
The CUDA and HIP adapters are both using a nearly identical complicated queue that handles creating an out-of-order UR queue from in-order CUDA/HIP streams. This patch extracts all of the queue logic into a separate templated class that can be used by both adapters. Beyond removing a lot of duplicated code, it also makes it a lot easier to maintain. There was a few functional differences between the queues in both adapters, but mostly due to fixes done in the CUDA adapter that were not ported to the HIP adapter. There might be more but I found at least one race condition (intel/llvm#15100) and one performance issue (intel/llvm#6333) that weren't fixed in the HIP adapter. This patch uses the CUDA version of the queue as a base for the generic queue, and will thus fix for HIP the race condition and performance issue mentioned above. This code is quite complex, so this patch also aimed to minimize any other changes beyond the structural changes needed to share the code. However it did do the following changes in the two adapters: `stream_queue.hpp`: * Remove `urDeviceRetain/Release`: essentially a no-op CUDA: * Rename `ur_stream_guard_` to `ur_stream_guard` * Rename `getNextEventID` to `getNextEventId` * Remove duplicate `get_device` getter, use `getDevice` instead HIP: * Fix queue finish so it doesn't fail when no streams need to be synchronized
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix race condition in CUDA stream creation in the UR CUDA adapter
See oneapi-src/unified-runtime#1984